fix(process): give backend workers a parent-death safety net by localai-bot · Pull Request #10639 · mudler/LocalAI

localai-bot · 2026-07-01T23:37:13Z

Symptom

A backend model-worker subprocess — the per-model gRPC server LocalAI spawns (e.g. a llama.cpp/whisper/etc. worker) — can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process itself is killed non-gracefully (for example, a supervising process's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before LocalAI's own teardown runs. This was hit by a downstream project that supervises LocalAI as a child process.

Root cause

LocalAI does have a working-by-design graceful teardown:

pkg/signals/handler.go installs signal.Notify(c, SIGINT, SIGTERM), runs registered handlers, then exits.
The serve path registers app.Shutdown() (core/cli/run.go), which calls ModelLoader.StopAllGRPC() → process.Stop() (pkg/model/process.go).

That teardown only runs if LocalAI receives a catchable signal and survives long enough to run its handlers. If LocalAI is SIGKILLed, none of it runs.
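The shape of that teardown path can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/signals/handler.go: `registerHandler`, `runHandlers`, and `installHandler` are hypothetical names, and the demo sends SIGTERM to itself to simulate a graceful stop. SIGKILL is absent from the `signal.Notify` list because the kernel never delivers it to user code, which is exactly why none of this runs in the failure case.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

var (
	mu       sync.Mutex
	handlers []func()
)

// registerHandler queues a cleanup function to run at shutdown (hypothetical name).
func registerHandler(h func()) {
	mu.Lock()
	defer mu.Unlock()
	handlers = append(handlers, h)
}

// runHandlers runs every registered cleanup function in registration order.
func runHandlers() {
	mu.Lock()
	defer mu.Unlock()
	for _, h := range handlers {
		h()
	}
}

// installHandler subscribes to the catchable termination signals.
// SIGKILL cannot be caught, so it cannot appear here.
func installHandler() <-chan os.Signal {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
	return c
}

func main() {
	c := installHandler()
	registerHandler(func() { fmt.Println("stopping backends") })

	// Simulate a graceful stop: SIGTERM is catchable, so the handlers run.
	_ = syscall.Kill(os.Getpid(), syscall.SIGTERM)

	sig := <-c
	runHandlers()
	fmt.Println("received:", sig)
}
```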

Backends are spawned via github.com/mudler/go-processmanager v0.1.1. Its getSysProcAttr() (in the library's process_unix.go) sets Setpgid: true — intentional, so the graceful path can signal the backend's whole process group — but it never sets PR_SET_PDEATHSIG/Pdeathsig, and the library exposes no Config field or functional option to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (pkg/model/process.go calls process.New(...).Run(); it never builds the exec.Cmd itself), so LocalAI cannot set a kernel parent-death signal at the spawn site. When LocalAI dies without cleaning up, the backend is reparented to init and keeps running. There is no fallback that makes an orphaned backend self-terminate.

Fix

Add a best-effort, backend-side safety net that detects reparenting: on startup each backend captures getppid() and then polls it. When the process is reparented (getppid() changes or becomes 1, the usual POSIX indication that the original parent has died), it logs and self-terminates. getppid() detection is portable across Linux and macOS, unlike the Linux-only PR_SET_PDEATHSIG, which also has a false-positive with a Go parent: the signal fires when the spawning thread exits, and the Go runtime may retire that thread while the process is still alive.

The same mechanism, env vars and semantics are now applied across all three backend languages LocalAI ships:

Go — pkg/grpc/parentwatch.go, armed in the shared grpc.StartServer / grpc.RunServer choke point that every out-of-process Go backend routes through.
C++ — backend/cpp/llama-cpp/parent_watch.h, a dependency-free header wired into grpc-server.cpp's main() (and copied at build time via prepare.sh).
Python — backend/python/common/parent_watch.py, armed from common/grpc_auth.py's get_auth_interceptors() — the single shared helper every Python backend invokes while building its gRPC server.

Shared configuration (identical across all three):

LOCALAI_BACKEND_PARENT_WATCH — default on; falsy values false/0/no/off (case-insensitive) disable it; automatically off on Windows (different reparenting semantics).
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL — poll interval, default 2s; accepts Go-style durations (500ms, 2s, 1m) in every language for parity.
Skips entirely when already orphaned at startup (getppid() <= 1).

This is strictly a backstop alongside the existing graceful SIGTERM → grace → SIGKILL teardown, which is unchanged in all three languages. No shutdown timing, GracefulTimeout, or IsBusy() polling was touched.

Test coverage

Each language has a real process-tree reparent test (test → middle → grandchild): the middle process exits to orphan the grandchild (running the real watcher), and the test asserts the watcher detects the reparent and self-terminates.

Go — pkg/grpc/parentwatch_test.go:

$ go test ./pkg/grpc/ -run TestParentDeathWatcherDetectsReparent -v -count=1
=== RUN   TestParentDeathWatcherDetectsReparent
--- PASS: TestParentDeathWatcherDetectsReparent (0.06s)
PASS
ok  	github.com/mudler/LocalAI/pkg/grpc	0.069s

C++ — backend/cpp/llama-cpp/parent_watch_test.cpp (uses fork(2); standard library only, so it runs via the existing standalone backend/cpp/run-unit-tests.sh runner — no CUDA/gRPC build needed; also buildable under ctest with -DLLAMA_GRPC_BUILD_TESTS=ON):

$ bash backend/cpp/run-unit-tests.sh
==> backend/cpp/llama-cpp/parent_watch_test.cpp
ok:   interval default 2000ms
ok:   interval 500ms / 2s / 1m / bare-3 / garbage-fallback
ok:   enabled by default; disabled by false/0/no/off/OFF/' False '
ok:   grandchild signaled readiness
ok:   watcher detected parent death and self-terminated
All parent_watch tests passed.
Ran 2 standalone C++ unit test file(s)   # exit 0

(The full backend build needs the llama.cpp + gRPC toolchain, so the watcher is verified by compiling and running its own translation unit standalone — the header is intentionally dependency-free precisely so this is possible.)

Python — backend/python/common/parent_watch_test.py (uses os.fork; standard library only):

$ cd backend/python/common && python3 -m unittest parent_watch_test -v
test_detects_reparent ... ok
test_disabled_by_falsey ... ok
test_enabled_by_truthy ... ok
test_enabled_default ... ok
test_interval_default ... ok
test_interval_garbage_falls_back ... ok
test_interval_units ... ok
Ran 7 tests in 0.062s
OK

Known limitations / follow-ups (not overclaiming)

C++ coverage is the llama-cpp backend only. C++ backends have no shared server scaffolding (each backend/cpp/*/grpc-server.cpp has its own main/RunServer), so the watcher was added to the originally-reported, most-used backend (llama.cpp). The other C++ backends — ds4, ik-llama-cpp, privacy-filter — are not yet covered; each would need the same one-line #include "parent_watch.h" + start_parent_death_watcher() as a follow-up (the header is reusable as-is).
Python coverage is all backends via the shared common/ choke point, with no per-backend edits.
The fully general fix would be for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language. That is a change to a separate repo and is intentionally out of scope for this LocalAI-only PR — suggested as an upstream follow-up.
Windows is not covered in any language (different reparenting model); the watcher is a no-op there.

🤖 Generated with Claude Code

…fully

Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI spawns) can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.

Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown -> ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when LocalAI receives a catchable signal and survives long enough to run its handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1, whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or option for a caller to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (it never builds the exec.Cmd itself), so it cannot set a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing tells the backend to exit and it is reparented to init.

Fix: add a best-effort, backend-side safety net at the one shared choke point every out-of-process Go backend routes through: grpc.StartServer / RunServer in pkg/grpc. On startup it captures getppid() and polls; when the process is reparented (getppid changes or becomes 1, the usual POSIX indication that the original parent died) it logs and self-terminates. getppid() reparent detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.

Scope/limitations: covers Go-based backends (everything using pkg/grpc). The C++ backends (e.g. llama-cpp) and Python backends do not route through pkg/grpc and are not covered by this mechanism; they would each need an equivalent parent-death check (follow-up). The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language (suggested upstream follow-up; out of scope for this LocalAI-only PR).

Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild process tree, lets the middle process exit to orphan the grandchild running the real watchParentDeath, and asserts it detects the reparent and self-terminates. Unix-only (build-tagged), runs in CI (Linux).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435) only protects backends that route through pkg/grpc. C++ and Python backends don't, so the originally-reported case (the llama.cpp gRPC worker surviving a non-graceful LocalAI death) was still uncovered.

Extend the same best-effort backstop to both languages, reusing the exact mechanism and semantics:

- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting (getppid() != orig || == 1), portable across Linux/macOS, no-op on Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style durations like 500ms/2s/1m)

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++ backend) as a dependency-free header parent_watch.h, wired into grpc-server.cpp's main() and copied at build time via prepare.sh. C++ backends have no shared server scaffolding, so other C++ backends (ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed from common/grpc_auth.py's get_auth_interceptors(), the single helper every one of the 35 Python backends invokes while building its gRPC server, so all Python backends (and future ones) are covered with no per-backend edits and no duplicated implementation.

Tests (real process-tree reparent detection, mirroring the Go test):

- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler and others added 2 commits July 2, 2026 07:31

mudler force-pushed the fix/backend-parent-death-signal branch from 04b474b to 94e3e06 Compare July 2, 2026 07:32

mudler merged commit a4e6e01 into master Jul 2, 2026
70 checks passed

mudler deleted the fix/backend-parent-death-signal branch July 2, 2026 17:16
